Variational Inference for Policy Gradient

Author

  • Tianbing Xu
Abstract

Inspired by the seminal work on Stein Variational Inference [2] and Stein Variational Policy Gradient [3], we derived a method to generate samples from the posterior distribution of the variational parameters by explicitly minimizing the KL divergence to the target distribution in an amortized fashion. We then applied this variational inference technique to vanilla policy gradient, TRPO, and PPO with Bayesian Neural Network parameterizations for reinforcement learning problems.

1 Parametric Minimization of KL Divergence

Suppose we have a random sample from a base distribution, ξ ∼ q_0(ξ), e.g. q_0 = N(0, I). We can generate an induced distribution q_φ(θ) through a general invertible and differentiable transformation θ = h_φ(ξ) (see Appendix A). Our goal is to use q_φ(θ) as a variational distribution matching the true distribution p(θ), i.e., to minimize J = KL(q_φ(θ) ‖ p(θ)).

Lemma 1.

\[
H(q) = H(q_0) + \mathbb{E}_{\xi \sim q_0}\left[\log \det\left(\frac{\partial h_\phi(\xi)}{\partial \xi}\right)\right] \tag{1}
\]

This follows from the change-of-variables formula q_φ(θ) = q_0(ξ) |det(∂h_φ(ξ)/∂ξ)|^{-1} with θ = h_φ(ξ). With (1), we have the following identity for KL(q_φ(θ) ‖ p(θ)):

\[
\begin{aligned}
\mathrm{KL}(q \,\|\, p) &= -H(q) - \mathbb{E}_{q(\theta)}\left[\log p(\theta)\right] \\
&= -H(q_0) - \mathbb{E}_{\xi \sim q_0}\left[\log \det\left(\frac{\partial h_\phi(\xi)}{\partial \xi}\right)\right] - \mathbb{E}_{\xi \sim q_0}\left[\log p(h_\phi(\xi))\right] \\
&= -H(q_0) - \mathbb{E}_{\xi \sim q_0}\left[\log \det\left(\frac{\partial h_\phi(\xi)}{\partial \xi}\right) + \log p(h_\phi(\xi))\right]
\end{aligned}
\]

Since H(q_0) does not depend on φ, minimizing KL(q_φ ‖ p) over φ amounts to maximizing the Monte Carlo estimable objective E_{ξ∼q_0}[log det(∂h_φ(ξ)/∂ξ) + log p(h_φ(ξ))].
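As a concrete illustration, here is a minimal sketch of this amortized KL-minimization scheme. It assumes a diagonal affine transform h_φ(ξ) = μ + exp(s) ⊙ ξ (a hypothetical choice; the paper allows any invertible, differentiable h_φ) and a standard-normal placeholder for the target log p(θ); in the paper's setting, p(θ) would instead be the posterior over Bayesian policy-network parameters. PyTorch is used for automatic differentiation.

```python
# A minimal sketch of amortized KL minimization, NOT the paper's exact
# implementation. Assumptions: h_phi(xi) = mu + exp(s) * xi (diagonal
# affine transform) and a standard-normal placeholder target log p(theta).
import torch

dim = 5  # dimensionality of theta (illustrative)

# Variational parameters phi = (mu, s); q_phi is the law of h_phi(xi).
mu = torch.zeros(dim, requires_grad=True)
s = torch.zeros(dim, requires_grad=True)


def log_p(theta):
    # Placeholder unnormalized log-target; a standard normal for illustration.
    return -0.5 * (theta ** 2).sum(dim=-1)


def kl_loss(n_samples=256):
    # KL(q_phi || p) = -H(q0) - E[log det(dh/dxi)] - E[log p(h_phi(xi))].
    # H(q0) is constant in phi, so it is dropped from the loss.
    xi = torch.randn(n_samples, dim)          # xi ~ q0 = N(0, I)
    theta = mu + torch.exp(s) * xi            # theta = h_phi(xi)
    log_det = s.sum()                         # log det of the diagonal Jacobian
    return -(log_det + log_p(theta).mean())   # Monte Carlo estimate


opt = torch.optim.Adam([mu, s], lr=1e-2)
for _ in range(1000):
    opt.zero_grad()
    kl_loss().backward()
    opt.step()
```

With this placeholder target, μ → 0 and s → 0 at the optimum, i.e. q_φ recovers p exactly; the same loop applies unchanged to any differentiable log p.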


Similar Papers

Combine Monte Carlo with Exhaustive Search: Effective Variational Inference and Policy Gradient Reinforcement Learning

In this paper we discuss very preliminary work on how we can reduce the variance in black box variational inference based on a framework that combines Monte Carlo with exhaustive search. We also discuss how Monte Carlo and exhaustive search can be combined to deal with infinite dimensional discrete spaces. Our method builds upon and extends a recently proposed algorithm that constructs stochast...


Stein Variational Policy Gradient

Policy gradient methods have been successfully applied to many complex reinforcement learning problems. However, policy gradient methods suffer from high variance, slow convergence, and inefficient exploration. In this work, we introduce a maximum entropy policy optimization framework which explicitly encourages parameter exploration, and show that this framework can be reduced to a Bayesian in...


Stochastic Variational Inference with Gradient Linearization

Variational inference has experienced a recent surge in popularity owing to stochastic approaches, which have yielded practical tools for a wide range of model classes. A key benefit is that stochastic variational inference obviates the tedious process of deriving analytical expressions for closed-form variable updates. Instead, one simply needs to derive the gradient of the log-posterior, whic...


Natural Gradients via the Variational Predictive Distribution

Variational inference transforms posterior inference into parametric optimization thereby enabling the use of latent variable models where it would otherwise be impractical. However, variational inference can be finicky when different variational parameters control variables that are strongly correlated under the model. Traditional natural gradients that use the variational approximation fail t...


Two Methods for Wild Variational Inference

Variational inference provides a powerful tool for approximate probabilistic inference on complex, structured models. Typical variational inference methods, however, require the use of inference networks with computationally tractable probability density functions. This largely limits the design and implementation of variational inference methods. We consider wild variational inference methods that...



Journal:
  • CoRR

Volume: abs/1802.07833
Issue: –
Pages: –
Publication date: 2018